Stratified sampling

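# Population of 160 players: 40 on each of four teams
teams <- c(rep("yankees", 40), rep("padres", 40), rep("mariners", 40), rep("dodgers", 40))
# Simulated salaries: each team is normal around its own mean (sd defaults to 1)
salary <- c(rnorm(40, mean = 25), rnorm(40, mean = 16), rnorm(40, mean = 23), rnorm(40, mean = 15))
df <- data.frame(teams, salary)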
head(df)
##     teams   salary
## 1 yankees 23.32796
## 2 yankees 24.14309
## 3 yankees 25.77968
## 4 yankees 24.08930
## 5 yankees 24.52521
## 6 yankees 23.30517

Population view


SRS vs Stratified

library(mosaic)  # supplies the formula interface for mean() and sample() on data frames
# population mean
mean(~salary, data = df)
## [1] 19.66963
# SRS
mean(~salary, data = sample(df, 40))
## [1] 19.50508
# Stratified sample: 10 players drawn separately from each team
strat.samp <- rbind(
    sample(subset(df, teams == "yankees"), 10),
    sample(subset(df, teams == "padres"), 10),
    sample(subset(df, teams == "mariners"), 10),
    sample(subset(df, teams == "dodgers"), 10))
mean(~salary, data = strat.samp)
## [1] 19.46597

Long-run performance

[Figures: corrs1, corrs2]
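
One way to look at long-run performance is to repeat both sampling schemes many times and compare how much the sample means vary. The following is a minimal sketch in base R, not the code behind the figures: the seed, the number of repetitions (`n_reps`), and the helper functions `srs_mean()` and `strat_mean()` are assumptions introduced for illustration, and the simulated population will not reproduce the numbers printed earlier.

```r
# Sketch only: rebuild a population like the one above. Values are simulated,
# so they will differ from the output shown on the earlier slides.
set.seed(1)  # arbitrary seed, chosen only for reproducibility
teams  <- rep(c("yankees", "padres", "mariners", "dodgers"), each = 40)
salary <- c(rnorm(40, mean = 25), rnorm(40, mean = 16),
            rnorm(40, mean = 23), rnorm(40, mean = 15))
df <- data.frame(teams, salary)

# Mean of one simple random sample of n players, ignoring team.
srs_mean <- function(data, n = 40) {
  mean(data$salary[sample(nrow(data), n)])
}

# Mean of one stratified sample: n_per players drawn separately from each team.
strat_mean <- function(data, n_per = 10) {
  rows_by_team <- split(seq_len(nrow(data)), data$teams)
  picked <- unlist(lapply(rows_by_team, function(rows) sample(rows, n_per)))
  mean(data$salary[picked])
}

n_reps <- 2000  # hypothetical number of repetitions
srs_means   <- replicate(n_reps, srs_mean(df))
strat_means <- replicate(n_reps, strat_mean(df))

# Centre of each sampling distribution next to the population mean.
c(population = mean(df$salary),
  srs        = mean(srs_means),
  stratified = mean(strat_means))

# Spread of each sampling distribution.
c(srs_sd = sd(srs_means), stratified_sd = sd(strat_means))
```

Under these assumptions both sets of means should centre on the population mean, while the stratified means should vary much less, because team membership accounts for most of the variation in salary.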

If you learn one thing in this class…

Principles of Experimental Design

Control: Compare treatment of interest to a control group.

Randomization: Randomly assign subjects to treatments.

Replication: Within a study, replicate by collecting a sufficiently large sample, or replicate the entire study.

Blocking: If there are variables that are known or suspected to affect the response variable, first group subjects into blocks based on these variables, and then randomize cases within each block to treatment groups.
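
The blocking and randomization principles above can be combined in a few lines of R. This is a minimal sketch under stated assumptions: the `subjects` data frame, the `block` labels, the seed, and the helper `assign_within_block()` are all hypothetical and stand in for whatever blocking variable a real study would use.

```r
# Sketch only: 12 hypothetical subjects, already grouped into two blocks.
set.seed(2)  # arbitrary seed, chosen only for reproducibility
subjects <- data.frame(
  id    = 1:12,
  block = rep(c("low risk", "high risk"), each = 6)
)

# Build a balanced set of labels for n subjects, then shuffle their order.
assign_within_block <- function(n) {
  labels <- rep(c("control", "treatment"), length.out = n)
  labels[sample.int(n)]
}

# Randomize separately inside each block.
subjects$group <- NA_character_
for (b in unique(subjects$block)) {
  rows <- which(subjects$block == b)
  subjects$group[rows] <- assign_within_block(length(rows))
}

# Each block should contribute equally to control and treatment.
table(subjects$block, subjects$group)
```

Randomizing inside each block, rather than across all subjects at once, guarantees that the blocking variable is balanced between the treatment groups instead of leaving that balance to chance.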

Replication

[Figure: psych]

Blocking

A study is designed to test the effect of light level and noise level on students' exam performance. The researcher also believes that light and noise levels might affect males and females differently, and so wants to make sure both genders are represented equally under each condition. Which of the following is correct?

  1. There are 3 explanatory variables (light, noise, gender) and 1 response variable (exam performance)
  2. There are 2 explanatory variables (light and noise), 1 blocking variable (gender), and 1 response variable (exam performance)
  3. There is 1 explanatory variable (gender) and 3 response variables (light, noise, exam performance)
  4. There are 2 blocking variables (light and noise), 1 explanatory variable (gender), and 1 response variable (exam performance)

Other key ideas

Placebo: fake treatment, often used as the control group for medical studies

Placebo effect: experimental units showing improvement simply because they believe they are receiving a special treatment

Blinding: when experimental units do not know whether they are in the control or treatment group

Double-blind: when both the experimental units and the researchers do not know who is in the control and who is in the treatment group

Consider acupuncture

[Figure: acupuncture]

How do you test if acupuncture reduces pain?

"Sham acupuncture" is a good control.

Practice


  1. Find your numerical pair
  2. Introduce yourself (name, year, major, hometown)
  3. Discuss the problems on the handout and record your thoughts.